A Parameter-Free Classification Method for Large Scale Learning

نویسنده

  • Marc Boullé
چکیده

With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to a wide spread of data mining solutions is the non-increasing availability of skilled data analysts, which play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state of the art performance within practicable computation time.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Influences of Small-Scale Effect and Boundary Conditions on the Free Vibration of Nano-Plates: A Molecular Dynamics Simulation

This paper addresses the influence of boundary conditions and small-scale effect on the free vibration of nano-plates using molecular dynamics (MD) and nonlocal elasticity theory. Based on the MD simulations, Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is used to obtain fundamental frequencies of single layered graphene sheets (SLGSs) which modeled in this paper as the mo...

متن کامل

An Efficient Parameter-Free Method for Large Scale Offline Learning

With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to a wide spread of data mining solutions is the non-increasing availability of skilled data analysts, which play a key role in data preparation and model selection. In this paper we present a ...

متن کامل

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

متن کامل

A Three-terms Conjugate Gradient Algorithm for Solving Large-Scale Systems of Nonlinear Equations

Nonlinear conjugate gradient method is well known in solving large-scale unconstrained optimization problems due to it’s low storage requirement and simple to implement. Research activities on it’s application to handle higher dimensional systems of nonlinear equations are just beginning. This paper presents a Threeterm Conjugate Gradient algorithm for solving Large-Scale systems of nonlinear e...

متن کامل

Developing a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature

According to the cities expansion, updating urban maps for urban planning is important and its effectiveness is depend on the information extraction / change detection accuracy. Information extraction methods are divided into two groups, including Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-pepper results and features with hole...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2009